[CI] improve test partition algorithm for better load balancing#7588
[CI] improve test partition algorithm for better load balancing#7588winson-00178005 wants to merge 1 commit intovllm-project:mainfrom
Conversation
|
Note Gemini is unable to generate a summary for this pull request due to the file types involved not being currently supported. |
|
👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:
If CI fails, you can run linting and testing checks locally according Contributing and Testing. |
28b5b88 to
dbadbee
Compare
- Enhanced greedy algorithm in run_suite.py partition function - Better load balancing by finding minimum total time bucket - Sort each bucket by estimated time (short tests first) Expected benefits: - Better parallel test execution efficiency - More balanced test distribution across partitions Signed-off-by: hejianping <hejianping7@huawei.com>
dbadbee to
bcdfcde
Compare
|
@Potabk please take a look |
|
Can you explain what's the difference? |
|
Before; python3 .github/workflows/scripts/run_suite.py \
--suite e2e-singlecard \
--auto-partition-id 0 \
--auto-partition-size 2
+----------------+-------------+
| Suite | Partition |
|----------------+-------------|
| e2e-singlecard | 1/2 |
+----------------+-------------+
✅ Enabled 8 test(s) (est. total 7099.0s):
- tests/e2e/singlecard/compile/test_graphex_qknorm_rope_fusion.py (est=69s)
- tests/e2e/singlecard/compile/test_norm_quant_fusion.py (est=106s)
- tests/e2e/singlecard/test_ilama_lora.py (est=112s)
- tests/e2e/singlecard/test_xlite.py (est=135s)
- tests/e2e/singlecard/pooling/test_classification.py (est=148s)
- tests/e2e/singlecard/test_camem.py (est=149s)
- tests/e2e/singlecard/test_llama32_lora.py (est=239s)
- tests/e2e/singlecard/spec_decode/test_mtp_eagle_correctness.py (est=6141s)
❌ Skipped 1 test(s) (consider recovering):
- tests/e2e/singlecard/model_runner_v2/test_basic.py
(vllm-scr) vllm-ascend % python3 .github/workflows/scripts/run_suite.py \
--suite e2e-singlecard \
--auto-partition-id 1 \
--auto-partition-size 2
+----------------+-------------+
| Suite | Partition |
|----------------+-------------|
| e2e-singlecard | 2/2 |
+----------------+-------------+
✅ Enabled 22 test(s) (est. total 7082.0s):
- tests/e2e/singlecard/test_auto_fit_max_mode_len.py (est=70s)
- tests/e2e/singlecard/compile/test_graphex_norm_quant_fusion.py (est=83s)
- tests/e2e/singlecard/test_multi_instance.py (est=120s)
- tests/e2e/singlecard/test_completion_with_prompt_embeds.py (est=136s)
- tests/e2e/singlecard/test_qwen3_multi_loras.py (est=140s)
- tests/e2e/singlecard/test_cpu_offloading.py (est=166s)
- tests/e2e/singlecard/test_aclgraph_mem.py (est=187s)
- tests/e2e/singlecard/test_async_scheduling.py (est=252s)
- tests/e2e/singlecard/test_eager_mode_acc.py (est=255s)
- tests/e2e/singlecard/test_sampler.py (est=258s)
- tests/e2e/singlecard/pooling/test_qwen3_reranker_lora.py (est=280s)
- tests/e2e/singlecard/test_quantization.py (est=284s)
- tests/e2e/singlecard/test_multistream_overlap_shared_expert.py (est=292s)
- tests/e2e/singlecard/test_models.py (est=320s)
- tests/e2e/singlecard/pooling/test_embedding.py (est=324s)
- tests/e2e/singlecard/test_guided_decoding.py (est=407s)
- tests/e2e/singlecard/test_vlm.py (est=495s)
- tests/e2e/singlecard/test_batch_invariant.py (est=506s)
- tests/e2e/singlecard/test_aclgraph_batch_invariant.py (est=515s)
- tests/e2e/singlecard/pooling/test_scoring.py (est=553s)
- tests/e2e/singlecard/spec_decode/test_v1_spec_decode.py (est=600s)
- tests/e2e/singlecard/test_aclgraph_accuracy.py (est=839s)
❌ Skipped 1 test(s) (consider recovering):
- tests/e2e/singlecard/model_runner_v2/test_basic.pyafter python3 .github/workflows/scripts/run_suite.py \
--suite e2e-singlecard \
--auto-partition-id 0 \
--auto-partition-size 2
+----------------+-------------+
| Suite | Partition |
|----------------+-------------|
| e2e-singlecard | 1/2 |
+----------------+-------------+
✅ Enabled 8 test(s) (est. total 7099.0s):
- tests/e2e/singlecard/compile/test_graphex_qknorm_rope_fusion.py (est=69s)
- tests/e2e/singlecard/compile/test_norm_quant_fusion.py (est=106s)
- tests/e2e/singlecard/test_ilama_lora.py (est=112s)
- tests/e2e/singlecard/test_xlite.py (est=135s)
- tests/e2e/singlecard/pooling/test_classification.py (est=148s)
- tests/e2e/singlecard/test_camem.py (est=149s)
- tests/e2e/singlecard/test_llama32_lora.py (est=239s)
- tests/e2e/singlecard/spec_decode/test_mtp_eagle_correctness.py (est=6141s)
❌ Skipped 1 test(s) (consider recovering):
- tests/e2e/singlecard/model_runner_v2/test_basic.py
(vllm-scr) vllm-ascend % python3 .github/workflows/scripts/run_suite.py \
--suite e2e-singlecard \
--auto-partition-id 1 \
--auto-partition-size 2
+----------------+-------------+
| Suite | Partition |
|----------------+-------------|
| e2e-singlecard | 2/2 |
+----------------+-------------+
✅ Enabled 22 test(s) (est. total 7082.0s):
- tests/e2e/singlecard/test_auto_fit_max_mode_len.py (est=70s)
- tests/e2e/singlecard/compile/test_graphex_norm_quant_fusion.py (est=83s)
- tests/e2e/singlecard/test_multi_instance.py (est=120s)
- tests/e2e/singlecard/test_completion_with_prompt_embeds.py (est=136s)
- tests/e2e/singlecard/test_qwen3_multi_loras.py (est=140s)
- tests/e2e/singlecard/test_cpu_offloading.py (est=166s)
- tests/e2e/singlecard/test_aclgraph_mem.py (est=187s)
- tests/e2e/singlecard/test_async_scheduling.py (est=252s)
- tests/e2e/singlecard/test_eager_mode_acc.py (est=255s)
- tests/e2e/singlecard/test_sampler.py (est=258s)
- tests/e2e/singlecard/pooling/test_qwen3_reranker_lora.py (est=280s)
- tests/e2e/singlecard/test_quantization.py (est=284s)
- tests/e2e/singlecard/test_multistream_overlap_shared_expert.py (est=292s)
- tests/e2e/singlecard/test_models.py (est=320s)
- tests/e2e/singlecard/pooling/test_embedding.py (est=324s)
- tests/e2e/singlecard/test_guided_decoding.py (est=407s)
- tests/e2e/singlecard/test_vlm.py (est=495s)
- tests/e2e/singlecard/test_batch_invariant.py (est=506s)
- tests/e2e/singlecard/test_aclgraph_batch_invariant.py (est=515s)
- tests/e2e/singlecard/pooling/test_scoring.py (est=553s)
- tests/e2e/singlecard/spec_decode/test_v1_spec_decode.py (est=600s)
- tests/e2e/singlecard/test_aclgraph_accuracy.py (est=839s)
❌ Skipped 1 test(s) (consider recovering):
- tests/e2e/singlecard/model_runner_v2/test_basic.py |
8849bf7 to
bcdfcde
Compare
Thanks for the observation! The skipped status of
|
The key difference is that the improved algorithm sorts tests within each partition by execution time (short tests first) for better execution efficiency. |
Expected benefits:
What this PR does / why we need it?
Improves test partition algorithm in CI workflow for better load balancing across parallel test executions. The current greedy algorithm uses simple round-robin assignment which leads to significant imbalance when test execution times vary widely.
Changes:
Performance Improvements (verified with e2e-singlecard suite):
Partition Count Range Reduction
2 partitions 99.1%
4 partitions 56.9%
8 partitions 37.6%
Expected Benefits:
Does this PR introduce any user-facing change?
No. This is an internal CI optimization that affects only test execution distribution and timing.
How was this patch tested?
The verification script demonstrates significant improvements in load balancing metrics:
For 4-partition setup: range reduced by 56.9%, standard deviation reduced by 58.5%
For 8-partition setup: range reduced by 37.6%, standard deviation reduced by 46.8%
Theoretical speedup of 1.309x for multi-partition scenarios
vLLM version: v0.18.0
vLLM main: vllm-project/vllm@ed359c4